INTRODUCTION


The  goal  of  the  University  of  Alabama  at  Birmingham  Interdisciplinary  Breast  Cancer  Training 
Program  (IBCTP)  is  to  educate  and  train  predoctoral  students  in  a  multidisciplinary  environment 
with  a  focus  on  breast  cancer  research.  The  aims  are  to  1)  recruit  predoctoral  trainees  to  the 
IBCTP;  2)  assure  that  predoctoral  trainees  obtain  a  broad-based  breast  cancer  education  and 
carry  out  interdisciplinary  breast  cancer  research;  3)  administer  this  program  with  sufficient 
oversight  to  ensure  high-quality  education  and  training,  efficient  completion  of  degree 
requirements,  and  productive  research  careers.  Our  training  program  is  designed  to  prepare  and 
motivate  trainees  to  pursue  careers  in  the  fields  of  breast  cancer  causation,  prevention,  diagnosis, 
therapy  and  education. 


BODY 

The executive committee consists of Dr. Danny Welch (Mechanisms of Growth Control), Dr.
Therese Strong (Gene Therapy), Robert B. Diasio (Cancer Pharmacology), Clinton Grubbs
(Chemoprevention), Charles N. Falany (Cancer Causation), and Dr. Coral A. Lamartiniere
(Program Director), plus one elected student/trainee, Tim Whitsett. The executive committee is
responsible for interviewing and selecting prospective IBCTP students, developing and
implementing the academic and research program, reviewing individual student progress and the
budget, and participating in Quarterly and Annual Program reviews.

TASKS  FOR  YEAR  FIVE  (No  Cost  Extension  9/04  -  8/05) 

1)  Schedule  IBCTP  seminar  speakers  (Aim  2). 

The APPENDIX contains the list of breast cancer seminar speakers for 2004-2005 (pages 13 and
14).

2)  Hold  quarterly  program  reviews  (Aim  3). 

Quarterly  program  reviews  were  held  by  the  executive  committee  to  discuss  recruitment,  the 
progress  of  the  trainees,  the  curriculum  and  the  evaluation  of  courses.  One  new  student  was 
recruited:  Scharri  Ezell. 

3)  Monitor  progress  of  trainees  (Aim  3). 

At  the  quarterly  meetings,  progress  of  individual  students  was  discussed.  At  the  end  of  the 
summer  meeting,  laboratory  evaluations  turned  in  by  the  mentors  were  taken  into  consideration. 
One of last year’s first-year students, Sarah Jenkins, made satisfactory academic progress (A and
B grades) and has selected a research mentor, Dr. Coral Lamartiniere (Cancer
Causation and Regulation). Heath McCorkle dropped out of the program because of health
problems. His fiancee died in an auto accident, and Heath has suffered severe depression. He is
under doctors’ care. A list of students, research topics and mentors is provided in the APPENDIX
(page 12).

4)  Scientific  Meetings  and  Abstracts  (for  all  years) 

Craig  Rowell  and  Hope  Amm  attended  and  made  poster  presentations  at  the  2005  AACR 
meeting  in  San  Francisco. 

Craig Rowell attended and presented at the 2005 Biostatistics and Cancer meeting in Auburn, AL.

Tim  Whitsett  attended  and  presented  at  the  2005  Gordon  Conference  on  Hormone  Action  in 
Development  and  Cancer,  and  the  2005  Society  of  Toxicology  Meeting  in  New  Orleans. 

James  Cody  attended  and  presented  at  the  2005  American  Society  of  Gene  Therapy 
Meeting  in  St.  Louis,  and  at  UAB’s  2005  Student  Research  Day. 

April  Adams  attended  the  2004  AACR  Special  Conference  on  Chromatin,  Chromosomes  and 
Cancer  Epigenetics,  in  Waikoloa,  Hawaii. 

Sarah Jenkins attended the 2005 7th International Symposium on Mass Spectrometry in the
Health and Life Sciences (San Francisco, CA), the 2005 Breast Cancer and Environment Center
meeting in Princeton, and the 2005 Clinical Proteomics Workshop: Today & Tomorrow (Nashville,
TN - Vanderbilt University).

The  PI  attended  and  presented  at  the  2005  AACR  meeting  in  San  Francisco,  2  Breast  Cancer  and 
the  Environment  meetings  in  Cincinnati  (2004)  and  Princeton  (2005),  and  the  2005  Society  of 
Toxicology  meeting. 

A  list  of  student  abstracts/presentations  is  contained  in  Reportable  Outcomes. 

5)  Hold  annual  program  review  (Aim  3). 

At  the  end  of  the  summer  executive  committee  meeting,  the  following  recommendations  were 
made.  The  Breast  Cancer  Causation  and  Regulation  course  and  new  format  Breast  Cancer 
Seminar  Series  received  very  good  evaluations  and  it  was  recommended  that  the  contents  be  kept 
the  same.  A  copy  of  the  Breast  Cancer  Causation  and  Regulation  course  content  is  enclosed  in 
the  APPENDIX  (page  15). 

6)  Prepare  and  submit  final  report  to  DOD.  Submitted. 

KEY  ACCOMPLISHMENTS 

•  The  program  now  has  9  predoctoral  Breast  Cancer  students  in  good  academic  standing 
and/or  making  good  progress  in  breast  cancer  research  or  graduated. 

•  One  student  (Craig  Rowell)  has  completed  the  requirements  for  his  Ph.D.  and  has  started 
a  postdoc  at  Duke  University  with  a  continued  research  focus  in  breast  cancer.  Five 
(Damon  Bowe,  Tim  Whitsett,  James  Cody,  April  Adams,  Kevin  Roarty)  have  been 
accepted  into  Ph.D.  candidacy.  One  (Hope  Amm)  has  an  approved  Ph.D.  committee  and 
is  scheduling  her  qualifying  exam.  Sarah  Jenkins  has  identified  her  mentor  (Dr. 
Lamartiniere)  and  has  started  her  research.  Scharri  Ezell  is  a  minority  first  year  student 
taking  class  work  and  carrying  out  lab  rotations. 

•  For academic year 2004-2005, with only carry-over funds, we interviewed 2 applicants
(from 20 completed applications) and offered admission to one. Ms. Scharri Ezell, a minority
student, was accepted. Her stipend and tuition are being paid from a UAB minority
fellowship.

•  The appendix contains the lectures for the Breast Cancer Causation and Regulation
course (page 12). The 2004-2005 course received a “very good” evaluation.


REPORTABLE  OUTCOMES 
•  Publications  (for  all  years) 

Whisenhunt,  T.W.,  Yang,  X.,  Bowe,  D.B.,  Paterson,  A.J.,  Toleman,  C.A.,  Kudlow,  J.E. 
“Escaping  Repression  at  Estrogen  Promoters:  Regulated  Coactivators  in  Repression  Complexes.” 
EMBO,  in  review. 

Bowe, D.B., Yang, X.*, Mukherjee, S., Whisenhunt, Rustgi, A.K., Paterson, A.P., Kudlow,
J.E.: “Groucho/TLEs Repress Wnt Signalling Via O-GlcNAc Transferase.” Nature Cell
Biology, in submission.

Bowe, D.B., Adereth Y., and Maroulakou, I.G.: “ErbB2/Her-2 neu promotes mammary
oncogenesis via reduction of p27Kip1 levels in cyclin D1-independent manner.” Oncogene, in
submission.

Sadlonova, A., Gault, S.R., Dumas, N.A., Bowe, D.B., Van Tine, B.A., Mukherjee, S., Novak, L.,
Frost,  A.R.:  “Persistence  and  Growth-Inhibitory  Effect  of  Human  Breast  Fibroblasts  on  the 
MCF10AT  Xenograft  Model  of  Proliferative  Breast  Disease.”  Cancer  Research,  in  submission. 

Bowe,  D.B.,  Sadlonova  A.,  Toleman,  C.A.,  Hu,  Y.,  Paterson,  A.J.,  Kudlow,  J.E.:  “O-GlcNAc  is 
a  critical  regulator  of  nuclear  hormone  receptor  expression  in  mammary  gland  development.” 
Molecular  Cell  Biology,  in  submission. 

Bowe, D.B., Sadlonova A., Whiteside, M., Frost, A.R., Grizzle, W.E.: “CWR22 as a model for
androgen  sensitivity  and  androgen  resistance  of  prostate  cancer.”  Review  article.  (In 
preparation.) 

Rowell,  C.,  M.  Carpenter,  C.  A.  Lamartiniere,  “Modeling  Biological  Variability  in  2-D  gel 
Proteomic  Carcinogenesis  Experiments”  J.  Proteome  Res.;  2005;  ASAP  Web  Release  Date:  13- 
Aug-2005;  (Article)  DOI:  10.1021/pr0501261 

Rowell,  C.,  D.  Mark  Carpenter  and  Coral  A.  Lamartiniere.  “Chemoprevention  of  Breast  Cancer, 
Proteomic  Discovery  of  Genistein  Action  in  the  Rat  Mammary  Gland.”  Accepted  in  Journal  of 
Nutrition 


•  Abstracts  (for  all  years) 

Hope M. Amm, Patsy G. Oliver, Donald J. Buchsbaum. TRA-8 anti-DR5 antibody and
chemotherapy  agents  produce  cytotoxicity  and  activate  apoptotic  pathways  in  breast  cancer 
cells.  (Abstract  #5357). 

Bowe,  D.B.,  Jones,  M.,  Page,  G.P.,  Allison,  D.B.,  and  Frost,  A.R.:  “Differences  in  gene 
expression of breast carcinomas of pre- and post-menopausal women.” Era of Hope DOD Breast
Cancer Research Program Meeting, Orlando, FL, Sept. 25-28, 2002.

Bowe, D.B., Jones, M., Sadlonova, A., Page, G.P., Allison, D.B., and Frost, A.R.: “Age-related
gene  expression  profiles  for  invasive  breast  carcinomas  in  pre-  and  post-menopausal  women.” 
Mammary  Gland  Biology,  Gordon  Research  Conference,  Bristol,  RI,  June  1-6,  2003. 

Whisenhunt, T.W., Yang, X., Bowe, D.B., Toleman, C.A., Paterson, A.J., Kudlow, J.E.
“Escaping Repression at Estrogen Promoters: Regulated Coactivators in Repression
Complexes.” Cambridge, U.K., March 18-21, 2004.

Cody,  J.,  Lyons,  G.,  and  Douglas,  J.  A  Dual-Action  Armed  Replicating  Adenovirus  for  the 
Treatment  of  Bone  Metastases  of  Breast  Cancer.  Mol.  Ther.  9,  S370,  2004. 

Roarty, K and Rosa Serra. Wnt5a Exhibits a Growth Inhibitory Effect on Development of the
Mammary  Gland,  American  Society  for  Cell  Biology  45th  Annual  Meeting,  San  Francisco,  CA 
2005. 

Rowell,  C,  Isbell,  S,  Desilva,  T  and  Lamartiniere,  CA.  2-Dimensional  gel  electrophoresis  and 
proteomic  identification  of  mammary  gland  proteins  of  rats  treated  with  the  soy  isoflavone, 
genistein.  Proceedings  of  the  American  Association  for  Cancer  Research.  43:35,  2002. 

Rowell,  C.,  Whitsett,  T.,  Carpenter,  M.  and  Lamartiniere,  C.A.  Proteomic  Analysis  of  Uterine 
Proteins  Following  Genistein  Exposure.  Proceedings  of  the  American  Association  for  Cancer 
Research.  44:  713,  2003. 

Carpenter, M., Rowell, C., Lamartiniere, C. and McCorkle, H., “2D-gel Proteomics in biomarker
discovery.” In Proceedings of Pharmaceutical Industry SAS Users Group 2004, San Diego,
California.

Rowell,  C.,  C.  Lamartiniere,  “Discovery  of  a  Novel  Pathway  of  Chemoprevention  by  Genistein 
using  Proteomics”  Susan  G.  Komen  Mission  Conference,  New  York,  NY,  2004. 

Rowell, C., G. Puckett, K. Roarty, M. Kirk, L. Wilson, M. Carpenter and C. A. Lamartiniere,
“Serum profiling and biomarker discovery of rat mammary tumors using mass-coded abundance
tags (MCAT)” 95th Annual meeting of the American Association for Cancer Research, Orlando,
FL, 2004.

Rowell, C. and C.A. Lamartiniere. From Discovery to Validation: Statistical and Biological
evaluations  of  Proteomics  data.  Department  of  Mathematics  and  Statistics,  Auburn  University, 
Auburn,  AL  2005 

Rowell, C. and C.A. Lamartiniere. Proteomic Discovery of Genistein Action in the Rat
Mammary Gland. 2005 AACR meeting in San Francisco.

Whitsett,  T.  and  Lamartiniere,  C.A.  Genistein  regulates  GRIP-1  in  the  rat  mammary  and  uterus. 
Presented  at  South  Central  Society  of  Toxicology  Meeting  in  Chattanooga  TN,  September,  2003. 

Whitsett  T,  Wang  J,  and  Lamartiniere  CA.  Steroid  coactivator  GRIP-1  regulation  with  genistein 
in the rat mammary gland. AACR Annual Meeting. Proceedings, Volume 45:661. 2004.

Whitsett  T,  Wang  J,  and  Lamartiniere  CA.  Genistein  regulates  the  steroid  coactivator  GRIP-1  in 
the rat mammary gland. Society of Toxicology 43rd Annual Meeting. Program page 58. 2004.

Whitsett  T  and  Lamartiniere  CA.  Breast  Cancer  Chemoprevention  with  the  Polyphenol 
Resveratrol.  Emerging  Topics  in  Breast  Cancer  and  the  Environment  Research.  2004. 

Whitsett  T  and  Lamartiniere  CA.  Breast  Cancer  Chemoprevention  with  the  Polyphenol 
Resveratrol.  Gordon  Research  Conference:  Hormone  Action  in  Development  and  Cancer.  July 
2005. 

Whitsett  T  and  Lamartiniere  CA.  Breast  cancer  chemoprevention  with  the  polyphenol 
resveratrol. Society of Toxicology 44th Annual Meeting. Program page 164. March 2005.


3)  Awards  to  Predoctoral  Students  (for  all  years) 

April  Adams:  AACR  Minority  Travel  Scholar  Award  in  Cancer  Research,  November  2004 

Damon  Bowe:  Merck  Toxicology  Externship,  Safety  Assessment  Division,  Merck  &  Co.,  West 
Point,  PA,  May  2005 

James  Cody:  Elected  Presiding  Officer  in  the  Molecular  and  Cellular  Pathology  graduate 
program  for  both  the  ’04-’05  and  ’05-’06  academic  years. 

Craig  Rowell:  Susan  Komen  Breast  Cancer  Predoctoral  Award  (DISS0201242)  Effects  of 
Genistein  and  TCDD  on  the  Maturation  of  the  Rat  Mammary  Gland:  Alterations  in  Protein 
Tyrosine  Kinase  Activity  and  Signaling. 

Craig  Rowell:  “AACR  Scholar  in  Training  Award”  Travel  award  for  the  2004  AACR  meeting 

Craig  Rowell:  1st  place  for  scientific  posters  sponsored  by  the  Breast  Cancer  and  the 
Environment  Research  Centers  (BCERC)  in  November  2004  in  Princeton  NJ 

Craig  Rowell:  Awarded  Graduate  Student  of  the  Year,  Department  of  Pharmacology  and 
Toxicology,  2005 

Tim  Whitsett:  Southeastern  Society  of  Toxicology  Poster  Award  (2003) 

Tim Whitsett: 2nd place for Emerging Topics in Breast Cancer and the Environment Research
Poster  Award  (2004) 

Tim  Whitsett:  Susan  G.  Komen  Foundation  Travel  Scholarship  (2004) 

Tim  Whitsett:  Graduate  Student-Postdoctoral  Fellow  Conference  Award  (Gordon  Research 
Conference  2005) 

Tim Whitsett: Society of Toxicology Travel Award (2005)

Tim  Whitsett:  DOD  Predoctoral  Training  Award  (BC043793)  Chemoprevention  Against  Breast 
Cancer  with  Genistein  and  Resveratrol.  2/25/05  -  2/25/08 

4)  Research  grants  received  in  part  because  of  preliminary  data  produced  by  Breast 
Cancer  predoctoral  students  (for  all  years) 


NIEHS 1R21 ES012326-01 (C.A. Lamartiniere, PI) 4/18/03 - 3/30/06

First  Year:  $100,000;  Total:  $300,000 

In  Utero  TCDD  Programming  for  Mammary  Cancer:  Proteomic  analysis  of  mammary  gland 
from  rats  treated  in  utero  with  TCDD. 

DOD  DAMD  BC  17-03-1-0433  (C.A.  Lamartiniere,  PI)  7/1/03-7/31/06 

First  Year:  $150,000;  Total:  $428,249 

Proteomic  Analysis  of  Genistein  Mammary  Cancer  Chemoprevention:  Proteomic  analysis  and 
interstitial  fluid  analysis  of  mammary  glands  of  rats  treated  with  genistein. 

Center for Nutrient-Gene Interaction in Cancer Prevention. NIH NCI P20 CA93753-02, S.
Barnes, Center Director. Project 1. Polyphenols: Mammary and Prostate Cancer
Chemoprevention. (C.A. Lamartiniere, P.I.). $833,638. 6/1/03 - 9/30/08.

Center for the Study of Environment and Mammary Gland Development. NIH/NIEHS. 1U01
ES012771-01. J. Russo, Fox Chase Cancer Center, Director; Lamartiniere, Co-PI.
9/29/03 - 7/31/10. UAB PI share: $1,540,000.

NIH 1 R01 CA108585-01A2, Armed Replicating Ad for Breast Cancer Bone Metastasis, Joanne
Douglas,  PI. 

Summary. The UAB institutional predoctoral breast cancer training grant has been a success on
this campus. It has served a focused group of bright young students/researchers who are
dedicated to investigating the cause, chemoprevention and therapy of breast cancer. These young
researchers are being trained to carry out cutting-edge breast cancer research. While the IBCTP
has been in existence for only 5 years, we have graduated one Ph.D. who is carrying out breast
cancer research at Duke University. Another is expected to graduate with his Ph.D. this year.
We then expect 6 more to graduate within the following 2 years. Overall we expect 9 Ph.D.s in
breast cancer research, 2 of whom are minorities. We are optimistic about the productivity of these
students based on the manuscripts published and submitted so far and the abstracts presented
at national/international meetings. Productivity will be better measured in the coming 5 years.

UAB  is  appreciative  of  the  opportunity  of  hosting  a  DOD  breast  cancer  training  program. 

Modeling Biological Variability in 2-D Gel Proteomic Carcinogenesis Experiments

Craig Rowell, Mark Carpenter, and Coral A. Lamartiniere

Department of Pharmacology and Toxicology, UAB Comprehensive Cancer Center, University of
Alabama at Birmingham, Birmingham, Alabama 35294, and Department of Mathematics and
Statistics, Auburn University, Auburn, Alabama

Received  May  3,  2005 

We propose a statistical method to model the underlying distribution of protein spot volumes in 2-D
gels using a generalized model (GM). We apply this approach to discover mechanisms of chemical
carcinogenesis in a rodent model. We generated 247 protein spots that were common to all gels (n =
18). Traditional statistical methods found 6.5% (13 out of 247) significant protein spots, whereas our
GM approach yielded a total of 53 (22.5%) differentially expressed protein spots.

Keywords:  statistics  •  2-D  gels  •  proteomics  •  carcinogenesis  •  DMBA  •  rat 

1.0. Introduction

Since the first major studies using two-dimensional gel
electrophoresis (2-D gels), the field of proteomics has undergone
rapid growth and development.1 Coupled with mass-spectrometry-based
protein identification, 2-D gels have been
viewed by scientists as a tool for the discovery of proteins and
pathways in numerous systems.2-5 Progress in proteomics
research has been directly related to the availability of standard
reagents, protocols, and computer programs for data analysis
(e.g., Progenesis and PDQuest).5 These improvements have
increased the number of treatment/comparison groups as well
as the number of biological replicates within each group that
can be examined. In addition, better imaging and processing
software allows for attention to proper statistical design and
analysis of experiments.

Postrun analysis is the bottleneck of 2-D gel experiments because
the data are high dimensional and likely to be highly variable.7
Deficiencies in experimental design and execution greatly impede
postrun analysis and decrease the overall sensitivity of the
technique. Problems related to analysis first arise in the
software processing of the gels, as reported by Nishihara and
Champion.8 These results point to the issue of false positive
discovery vs accuracy as a tradeoff affecting the choice of
software to use. Another consideration is building composite
gels to increase the number of real spots to analyze. Central to
composite gel analysis is how to treat absent spots (i.e.,
averaging intensities vs choosing a “best-of” analysis). Technology
such as CyDyes can potentially overcome this problem,
but not without introducing other considerations. Mauer et al.
examined 2-D gel data using statistical processes inherent in
the analysis software as well as algorithms applied to microarray
data.9 Recently, Chang et al. investigated the issue of spot
normalization (a computer generated process) to address the
issue of missing values (spots that are represented in the
majority but not all of the gels in a data set).10 A modeling
procedure used by Gustafsson et al. adjusts for variances in
spot volume data by applying alternative transformations.11
That each of the above approaches has had a measure of
success shows there are numerous approaches for evaluation
of 2-D gels.

Much of 2-D gel analysis is based on the search for significant
variation between the means (medians) in different groups
using the two-sample t-test and analysis of variance (ANOVA).
The assumption is that the populations being studied are
normally distributed with constant variances, independent of
the mean expression levels. If the assumptions are violated,
transformations (e.g., log) are taken to make the data more
closely conform to the normal distribution. However, this
approach has produced limited success because the same
transformation is usually applied across all analysis variables. Gustafsson
et al. noted that even after they transformed their 2-D gel
expression data, substantial variance heterogeneity remained.11
So, rather than manipulating the data until it conforms to
preconstructed assumptions, we propose to model the data
separately for each protein.

From evolution and development literature we borrow the
term “standard norms of reaction” (NoR) to introduce our
modeling process of 2-D gel data. Woltereck introduced the
concept of NoR to represent the variation of phenotypic
response to environmental alterations based on the genotype
of the organism.12,13 The environmental condition in our current
study is the process of carcinogenesis. In this study the same
genetic strain of animals has been exposed to the same
environmental insult (dimethylbenz[a]anthracene, DMBA). We
know that this experiment will result in the production of
mammary tumors in all treated animals. We also know that
the timeline of palpable tumor development is variable among
the individual animals; therefore, there is an underlying plasticity
in the phenotypic response.14,15 To avoid confounding effects
of tumor heterogeneity, we will look at the period of early lesion
formation. In general, we presume that changes observed at
this time point will reflect early biochemical events related to
promotion. It is our goal to model a tissue protein signature(s)
associated with early cancer formation.

In this paper, we describe the importance of statistical design
and analysis when conducting investigations using 2-D gels for
differential protein expression profiling. A series of experiments
and analyses related to our research into the biochemical
mechanisms of carcinogenesis by DMBA in a rodent mammary
model provide the data. We propose a statistical method
whereby the underlying distribution of spot volume is modeled
directly as a generalized distribution. This generalized model
(GM) encapsulates the various methods of transformations and
analyses found in modern proteomic literature. The GM
method will therefore yield better rates of discovery than more
traditional proteomic statistical analyses and better reflect
biological changes in protein expression.

2.0.  Materials  and  Methods 

Sprague-Dawley CD rats were purchased from Charles River
Breeding Laboratories (Raleigh, NC). Dimethylbenz[a]anthracene
(DMBA) and sesame oil were purchased from Sigma
Chemical Company (St. Louis, MO). Isoelectric focusing (IEF)
strips, IEF buffer, Multiphor II, tissue grinding kits, and albumin
removal kits were purchased from Amersham Biosciences (now
a member of GE Healthcare, Piscataway, NJ). All other chemicals
were purchased from Fisher Scientific (Hampton, NH).
SyproRuby and the VersaDoc densitometer were purchased
from Bio-Rad (Hercules, CA). SAS v.10 was purchased from the
SAS Institute (Cary, NC).

2.1. Pilot Projects, Replication and Power Analysis. One
important aspect of experimental design is choosing sample
size.16 In this study, we promote the use of power analysis in
determining sample size. Other issues of statistical design are
the elimination of extraneous sources of variability and choosing
the number and levels of comparison groups. Our first
consideration is the choice between technical and biological
replications.

Technical versus Biological Replication. Using technical
(analytical) replicates over biological replicates has been widely
discouraged.17-19 However, Asirvatham et al. stated that the use
of and investment in analytical replicates for pilot projects is
extremely valuable for data quality control and validation of
the 2-D gel handling process.20 Importantly, the general
consensus is that, as functions of cost and resources, biological
replicates provide considerably more scientific information
than analytical replicates.

Power and Sample Size. Pilot studies were conducted to
determine optimal sample size based on power analysis. In our
study, the power estimate and sample size determinations
involved using unique uterine samples (biological replicates)
from 8 control- and 8 genistein- (a phytoestrogen found in soy)
treated rats. The variance for each of the commonly detected
proteins was estimated using the pilot expression data. The
variance estimate was used to evaluate sample size effects for
discovering specific protein volume fold-changes. Rather than
basing power analysis on crude family-wise adjustments, such
as Bonferroni, we designed an experiment with sufficient power
to examine at least one single protein comparison (in our study,
the power analysis was based on adjustment of 100 proteins),
and after the data were collected we computed the estimated
false discovery rate to assess the potential number of discoveries.
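
The sample-size reasoning above can be sketched with a standard normal-approximation formula for a two-sided, two-sample comparison of means. This is only an illustration, not the authors' SAS procedure: the function name `samples_per_group`, the choice of a log2 scale, and the example values of `sigma` and `delta` are all assumptions introduced here.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate biological replicates needed per group.

    sigma: assumed within-group standard deviation of the (e.g., log2) spot volume,
           as would be estimated from pilot data (hypothetical input).
    delta: difference in means to detect (log2 fold-change if a log2 scale is used).
    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical pilot estimate: SD of 0.7 on the log2 scale, 2-fold change (delta = 1).
    print(samples_per_group(sigma=0.7, delta=1.0))
    # A stricter per-test alpha (e.g., 0.05 spread over 100 proteins) raises the requirement.
    print(samples_per_group(sigma=0.7, delta=1.0, alpha=0.05 / 100))
```

Lowering `alpha` to account for many proteins increases the required replicates quickly, which is one reason an after-the-fact FDR estimate can be attractive compared with a strict family-wise adjustment.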

False Discovery Rates (FDR). Benjamini and Hochberg first
coined the phrase “false-discovery rate” (FDR), now commonly
applied in significance testing designed for high dimensional
biology.21-25 For a particular experiment, the FDR is the
expected or estimated proportion of false discoveries out of
the total number of significantly different genes/proteins. This
means that a large FDR of 50% would lead the researcher to a
different decision with respect to allocation of resources than
if the FDR were 5%. Therefore, we computed the estimated
FDR to assess the potential number of discoveries after the data
were collected.23,24
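
As a rough illustration of the FDR idea described above, the sketch below implements the Benjamini-Hochberg step-up rule and a crude FDR estimate (expected false positives under the global null divided by the number of spots called significant). The function names and the example p-values are hypothetical, and the text does not state that the authors used exactly these calculations.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k whose p-value satisfies p_(k) <= k * q / m.
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def crude_fdr_estimate(n_tests, n_significant, alpha=0.05):
    """Expected false positives if every null were true, over the number called significant."""
    if n_significant == 0:
        return 0.0
    return min(1.0, n_tests * alpha / n_significant)


if __name__ == "__main__":
    # Made-up p-values for ten protein spots.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, q=0.05))
    print(crude_fdr_estimate(n_tests=100, n_significant=20))
```

An estimate near 0.5 would suggest that about half of the flagged spots may be false leads, which is the kind of resource-allocation signal the paragraph above describes.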

2.2. Study Design. Animal care and treatment were performed
according to established guidelines approved by the
UAB Animal Care Committee. Eighteen 50-day-old female
Sprague-Dawley rats were divided into two groups and either
gavaged with 40 µg DMBA/g body weight (n = 8) or gavaged with an
equal volume of vehicle, sesame oil only (n = 10). At 75 days
of age (25 days post DMBA treatment), animals were anesthetized
with ketamine/xylazine and the fourth abdominal mammary
glands were dissected. We selected this time point
with the intention of investigating mammary glands with early
preneoplastic lesions and biochemical alterations, and yet
relatively free of tumor mass. Each gland was cut in half longitudinally
to allow both proteomic and pathological
evaluations. Frozen mammary tissues were homogenized in
lysis buffer formulated for 2-D gels using tissue grinding kits.20
After measuring protein concentration via Bradford’s assay
(Bio-Rad), equal concentrations of sample were subjected to
albumin removal. Protein concentration was remeasured and
150 µg protein aliquots were diluted in rehydration buffer. The
samples were applied to separate immobilized pH gradient
(IPG) strips (24 cm, pH 4-7) and allowed to rehydrate overnight
at room temperature. The IPG strips were placed on a flatbed
electrophoresis unit (Multiphor II) and a current gradient
applied (500 V for 1 h, 3500 V for 1.5 h, followed by 3500 V for
22.5 h). After isoelectric focusing, IPG strips were equilibrated
first in 100 mM dithiothreitol for 45 min followed by equilibration
in 120 mM iodoacetamide for 45 min. IPG strips were loaded
onto pre-cast 1.5 mm, 12.5% SDS gels and run on a Dodecacell
vertical electrophoresis unit according to manufacturer’s suggestions.
Both IEF and SDS gels were run as block groups
consisting of equal treatments per run. Once gels were run to
completion, they were stained using SyproRuby and scanned
via a VersaDoc 4000 Densitometer. Spot matching and gel
warping were done using Progenesis Discovery 2004. Processed
data were imported into SAS version 10 and analyzed using
statistical methods and algorithms based on various SAS
procedures.

2.3. Data Processing. For our experiments, we elected to use
the “total spot volume” normalization procedure found in the
Progenesis software. After spot matching and gel warping were
completed, the data file was exported to SAS for processing.
For all the following procedures, we evaluated only spots that
were common to all gels in the data set. The first step in our
cleanup procedure was to perform a t-test. Each spot identified
as significant (p < 0.05) was located and the spot’s presence
was visually confirmed in all the gels. As needed, manual
re-matching of spots was conducted and the statistical program
was re-run to generate a new list of p-values for the matched
spots. This iterative process was run numerous times to ensure
that matches reflected high quality spots (i.e., consistent shape,
nonsaturation and proper splitting).
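
The two data-handling steps named above (normalizing by total spot volume and keeping only spots common to all gels) can be sketched as follows. This is a simplified illustration under stated assumptions: each gel is represented as a hypothetical dictionary mapping spot IDs to raw volumes, and the normalization shown (volume divided by the gel's total) may differ in detail from what the Progenesis software actually computes.

```python
def normalize_total_volume(gel):
    """Express each spot volume as a fraction of the gel's total spot volume."""
    total = sum(gel.values())
    return {spot: volume / total for spot, volume in gel.items()}


def common_spots(gels):
    """Return the sorted spot IDs that were detected in every gel."""
    ids = set(gels[0])
    for gel in gels[1:]:
        ids &= set(gel)
    return sorted(ids)


def build_analysis_table(gels):
    """Normalize each gel, then keep only the spots shared by all gels."""
    normalized = [normalize_total_volume(gel) for gel in gels]
    shared = common_spots(gels)
    return {spot: [gel[spot] for gel in normalized] for spot in shared}


if __name__ == "__main__":
    # Hypothetical raw spot volumes for two gels; spot "s3" is missing from the second gel.
    gels = [
        {"s1": 2.0, "s2": 6.0, "s3": 2.0},
        {"s1": 1.0, "s2": 3.0},
    ]
    print(build_analysis_table(gels))
```

Restricting the table to shared spots sidesteps the missing-value question raised in the Introduction, at the cost of discarding spots that are absent from even one gel.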

2.4. Statistical Methods. First, we applied traditional statistical
approaches to differential expression analysis and their
adaptations to assumption violations. Second, since in proteomic
studies it is not uncommon to come across data that
are nonnormally distributed and/or differently dispersed, we
discuss two different ways of dealing with these situations. In
Section 2.4.2, we describe an approach that we refer to as an
indirect method, where traditional statistical analysis is conducted
on the transformed data. In Section 2.4.3, we describe
our direct approach, where general classes of distributions
(generalized gamma, exponential, or Weibull) are directly fitted
to the data using a generalized linear model.

2.4.1. Differential Protein Expression. For a given 2-D gel
experiment, proteomic differential expression analysis describes
the process of conducting multiple hypothesis tests, one for
each protein, across all commonly expressed proteins. In the
traditional two-sample t-test, any protein resulting in a p-value
that is less than a prespecified $\alpha$ (i.e., 0.05) is considered
significant and that protein is deemed differentially expressed.
This approach must be implemented with caution because the
error rate is fixed only for one specific test; if more than
one hypothesis/protein is tested, then the error rate accumulates
across all tests.
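
As a rough illustration of this accumulation, the following Python sketch computes the probability of at least one false positive across m tests that are each run at level $\alpha$. The tests are assumed here to be independent, which is a simplification introduced for the example and not a claim about 2-D gel data; the function name is hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests,
    each conducted at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    # One test, ten tests, and the 247 common spots analyzed in Section 3.3.
    for m in (1, 10, 247):
        print(m, familywise_error_rate(0.05, m))
```

With a single test the error rate equals $\alpha$, but it approaches 1 quickly as the number of proteins tested grows.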

Since many experiments involve the comparison of two
treatment groups and since the approach can be easily
generalized to more than two groups, we focus our attention
on the two-population comparison. If there are $n_1$ and $n_2$ gels
processed in groups 1 and 2, respectively, and the populations
are assumed to have approximately equal variances, then the
two-sample t-test involves the computation of the following
test statistic


$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, \qquad
s_p = \left[ \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right]^{1/2} \tag{1} $$

where $\bar{x}_1$, $\bar{x}_2$, $s_1^2$, and $s_2^2$ are the sample means and variances
from each sample, respectively. In the two-sample t-test, if the
normality assumption is reasonable but the common variance
assumption is violated, then eq 1 may not be valid. However,
approximate t-tests are available to test for differences between
the two means. Cochran and Cox proposed an approximate
t-test, but the degrees of freedom were undefined when the
sample sizes were unequal and the test was quite conservative.27,28
Satterthwaite's approximation for the degrees of
freedom can be used for the approximate t-test in these cases,
but the test given below still remains conservative.28-30

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad
df = \frac{(w_1 + w_2)^2}{w_1^2/(n_1 - 1) + w_2^2/(n_2 - 1)}, \qquad
w_1 = s_1^2/n_1, \quad w_2 = s_2^2/n_2 \tag{2} $$

Regardless of the test used (eqs 1 or 2), there is a question of
efficiency since one usually tests the equal variance assumption
before deciding between the t-test (equal variances) or the
approximate t-test (unequal variances). Nonparametric approaches
do not provide much relief since they typically assume
symmetrically distributed populations with common variance
or similar shapes across groups.
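
The two test statistics can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the SAS code used in the study; the function names are hypothetical, and the statistics would still need to be referred to a t distribution with the returned degrees of freedom to obtain p-values.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance two-sample t statistic (eq 1) and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Pooled standard deviation from the two sample variances.
    sp = sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))
    t = (mean(x) - mean(y)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def approximate_t(x, y):
    """Unequal-variance t statistic with Satterthwaite degrees of freedom (eq 2)."""
    n1, n2 = len(x), len(y)
    w1, w2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(w1 + w2)
    df = (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))
    return t, df
```

When the two groups have equal sample sizes the two statistics coincide and only the degrees of freedom differ, which is why the choice between them matters most for unbalanced designs.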

Since a 2-D gel experiment involves several hundred hypothesis
tests on unknown proteins, it is impossible to know
whether the equal variance assumption is true for all proteins.
On the basis of the sample expression between two groups,
one can test whether the underlying populations have the same
spread, dispersion, or variance ($\sigma_1^2 = \sigma_2^2$ versus
$\sigma_1^2 \neq \sigma_2^2$) using the Folded Form F-test (eq 3)

$$ F = \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)} \tag{3} $$

which, under the normal distribution assumption, has an
$F_{n_1-1,\,n_2-1}$ distribution. Although most researchers conduct such
a test to determine whether the two-sample t-test based on
equal variance is valid, we propose that proteins that have
significantly different variances between groups may well be
of biological significance. That is, if a protein has significantly
different variances between groups, then it is included in the
list of significant proteins whether or not there are significant
mean differences in expression.
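
A minimal Python sketch of the folded-form statistic in eq 3 follows. The function name is hypothetical, and the convention that the numerator degrees of freedom come from the sample with the larger variance is an assumption of this sketch; a p-value would require an F distribution from a statistics library.

```python
from statistics import variance


def folded_f(x, y):
    """Folded-form F statistic (eq 3) with numerator and denominator df.

    Assumption of this sketch: the numerator df belongs to the sample with
    the larger variance. Both samples must have nonzero variance.
    """
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1
```

Because the larger variance is always placed in the numerator, the statistic is at least 1 and does not depend on which group is labeled first.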

2.4.2. Transformation Approach. In genomic and proteomic
studies, statistical analysis is often conducted on the log-transformed
data across all genes and proteins. In many cases,
this approach results in more symmetrically distributed data
and/or dampens the effect of nonconstant variance at high
levels and outliers. However, Rocke and Durbin provided
evidence that for low-expressing genes or proteins, this
transformation can make matters worse.11,12 Accordingly, much
literature has been dedicated to more generalized transformations
such as Box-Cox31 and generalized-log transformations
that serve as an alternative to blind application of a single
transformation.32-34 Specifically, if y represents the expression
value for a particular protein or gene, then the simple Box-Cox
transformation is of the form $z = (y^{\lambda} - 1)/\lambda$ if $\lambda \neq 0$ and
$z = \log(y)$ if $\lambda = 0$.33 This class includes most of the common
transformations, including the log-transform and various power
and inverse power transforms. The underlying goal in using a
generalized transformation is that the resulting data will be
more in line with the model assumptions and therefore
produce more robust analyses of the data.31,32,34-39 The generalized
class of transformations is appealing because it is
very flexible. It includes a form of the simple log-transform
as a special case, and the appropriate transformation can be
estimated using maximum likelihood approaches.33 The TRANSREG
procedure in SAS offers the maximum likelihood approach
in fitting the optimal Box-Cox transformations to data
taken from Draper and Smith.33 The model fitting feature allows
one to optimize and/or customize the transformation for each
individual protein or gene rather than doing a single log-transform
across all proteins or genes.
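
The per-protein choice of $\lambda$ can be illustrated with a short Python sketch. It is not the TRANSREG procedure: it assumes a single group of positive expression values for one spot, uses the standard normal-theory profile log-likelihood with constants dropped, and searches a user-supplied grid. All function names are hypothetical.

```python
from math import log
from statistics import pvariance


def boxcox(y, lam):
    """Simple Box-Cox transform of positive values y for a given lambda."""
    if lam == 0:
        return [log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped).

    pvariance divides by n, which is the maximum likelihood variance estimate.
    """
    z = boxcox(y, lam)
    return -0.5 * len(y) * log(pvariance(z)) + (lam - 1.0) * sum(log(v) for v in y)


def best_lambda(y, grid):
    """Grid-search maximum likelihood estimate of lambda for one spot."""
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))
```

Calling `best_lambda` separately on each spot's values mirrors the idea of customizing the transformation per protein instead of applying one log-transform to every spot.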

2.4.3. Generalized Model (GM). The generalized model more
directly addresses the problems discussed above by providing
a unified theoretical and conceptual framework for analyzing
protein differential expression across each protein spot. Generalized
models assume the response variable (expression) is
not necessarily normally distributed and the underlying distributions
may not have constant variances between groups
or across levels of the predictor variables.40 In many cases
other than the normal distribution, the populations may have a
mathematical dependency (link function) between the variance
and the mean of the populations. The GENMOD procedure in
SAS provides a ridge-stabilized Newton-Raphson algorithm to
maximize the log-likelihood function in estimation and testing
of parameters in the model for a broad collection of models,
including the normal, inverse-Gaussian, gamma, negative
binomial, and Poisson distributions.
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Therefore,  we  propose  a  method  for  2-D  gel  analysis  whereby 
the  underlying  distribution  is  modeled  directly  as  a  generalized- 
gamma  distribution,  which  has  the  Weibull,  exponential, 
gamma,  and  log-normal  as  special  cases.  Each  of  these  special 
distributions has a relationship with a log-location-scale family
of distributions. For example, taking the log-transformation of
Weibull, gamma, and log-normal data leads to the extreme
value distribution, the log-gamma distribution and the normal
distribution, respectively. Each of these distributions is a special
case of the generalized log-gamma distribution.41 Therefore,
under  the  right  conditions,  fitting  the  generalized  gamma  or 
the  generalized  log-gamma  distribution  to  data  leads  to 
distributions  approximating  the  true  underlying  distributions 
individually  and  perhaps  more  accurate  statistical  contrasts 
between  treatment  groups.  Inference  can  then  be  made  about 
the  location,  shape  and  scale  of  the  distribution  without  having 
prior  knowledge  of  the  specific  positive  support  distributions 
across all proteins within the given populations or treatment
groups. Therefore, when the goal is discovery of proteins, we
propose a method where the generalized gamma distribution
is fit to each specific commonly expressed protein within
populations and tested for significant differences across the
populations. The new list of proteins is then compared and
contrasted to those found as worthy of follow-up analysis
through  other  more  traditional  methods,  including  tests  on 
mean  and  variance  differences. 

Our GM method is expressed as follows: Y denotes the
expression for a particular spot on a 2-D gel, and Y has a
generalized gamma distribution if its density function is
of the form

$$ f(y) = \frac{|\delta|}{y\,\Gamma(\delta^{-2})} \left(\delta^{-2} y^{\delta}\right)^{\delta^{-2}} \exp\left(-\delta^{-2} y^{\delta}\right), \quad y > 0 \tag{4} $$

where $\Gamma(\cdot)$ is defined as the gamma function. Taking the
log-transform of a generalized-gamma random variable, $z = \log(y)$,
results in the location-scale family called the generalized-log-gamma
distribution, given in its standard form as follows:

$$ f(z) = \frac{|\delta|}{\Gamma(\delta^{-2})} \left(\delta^{-2} \exp(\delta z)\right)^{\delta^{-2}} \exp\left(-\delta^{-2} \exp(\delta z)\right), \quad z, \delta \in (-\infty, \infty) \tag{5} $$

The parameter $\delta$ is referred to as the shape parameter. If $\delta = 1$,
then the log-generalized gamma becomes the extreme value
distribution and the corresponding generalized gamma becomes
the Weibull distribution. If $\delta = 0$, then the log-generalized
gamma becomes the normal distribution and the
corresponding generalized gamma becomes the log-normal
distribution. Regression analysis based on these models can
be done by using the LIFEREG, NLP, or NLIN procedures in
SAS. The typical approach is to log-transform the data first,
and then fit the generalized log-gamma distribution separately
to each of the protein expression variables, which is equivalent
to fitting the corresponding generalized gamma to the raw data.
The density in eq 5 is expressed in standard form (just as a
normal distribution with mean zero and unit variance is the
standard form of the normal family). As Lawless pointed out,
the generalized log-gamma (GLG) distribution is a location-scale
family, just like the normal family, and these parameters
are introduced into the density by letting $z = (u - \mu)/\sigma$, and u
becomes a GLG($\mu$, $\sigma$, $\delta$), where $\mu$, $\sigma$, and $\delta$ represent the location,
scale, and shape parameters, respectively.41
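
The special cases can be checked numerically with a short Python sketch of the standard-form densities. It assumes the parameterization of eqs 4 and 5 with $\delta \neq 0$ and works on the log scale for numerical stability; the function names are hypothetical and this is not the LIFEREG implementation.

```python
from math import exp, lgamma, log


def glg_logpdf(z, delta):
    """Log density of the standard generalized log-gamma distribution (eq 5).

    Valid for delta != 0; as delta approaches 0 the density approaches the
    standard normal density.
    """
    k = delta ** -2  # k = delta^(-2)
    return (log(abs(delta)) - lgamma(k)
            + k * (log(k) + delta * z) - k * exp(delta * z))


def gg_pdf(y, delta):
    """Density of the standard generalized gamma distribution (eq 4), y > 0.

    Obtained from eq 5 by the change of variables z = log(y).
    """
    return exp(glg_logpdf(log(y), delta)) / y
```

Setting `delta = 1.0` reduces `glg_logpdf(z, 1.0)` to `z - exp(z)`, the standard extreme value log density, while a small value such as `delta = 0.01` gives values close to the standard normal density.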


Figure 1. Power analysis versus sample size. This graph illustrates
how power and sample size are related with respect to
detection of fold change in protein expression.

The location can be expressed in terms of a linear regression
model on the log-transformed data. Initial estimates of regression
parameters are obtained by doing ordinary least-squares
regression on the log-transformed data, which are then used
to get more precise maximum-likelihood estimators (MLE)
using some numerical method such as the ridge-stabilized
Newton-Raphson algorithm. Differences in location between populations
can then be tested directly using a chi-square test (Wald
test). The GENMOD or NLP procedures could be used to
generalize this approach for simultaneously testing for differences
in location (mean/median), scale (variances/standard
deviation), and shape, but for illustrative purposes in this paper
we focus on tests for differences in location in models in which
the mean and variance are mathematically related.
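
The form of the Wald test can be sketched for the simplest special case. The Python example below assumes the log-normal member of the family ($\delta = 0$) and a two-group design, where the maximum likelihood estimate of the location difference is the difference in mean log expression and no iterative fitting is needed. It is an illustration under those assumptions and not the generalized gamma fit itself; the function name is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import mean


def wald_location_test(group1, group2):
    """Wald chi-square test (1 df) for a location shift on the log scale.

    Sketch for the log-normal special case only. Returns the estimated
    location difference, the Wald statistic, and its p-value.
    """
    z1 = [log(v) for v in group1]
    z2 = [log(v) for v in group2]
    n1, n2 = len(z1), len(z2)
    m1, m2 = mean(z1), mean(z2)
    # The MLE of the common variance divides the residual sum of squares by n.
    sse = sum((v - m1) ** 2 for v in z1) + sum((v - m2) ** 2 for v in z2)
    sigma2 = sse / (n1 + n2)
    beta = m2 - m1                      # estimated location difference
    se = sqrt(sigma2 * (1 / n1 + 1 / n2))
    wald = (beta / se) ** 2
    p_value = erfc(sqrt(wald / 2))      # upper tail of chi-square with 1 df
    return beta, wald, p_value
```

For other members of the family the estimate and its standard error would come from the numerical maximization described above, but the statistic is formed and referred to the chi-square distribution in the same way.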

3. Results

3.1. Power Analysis/Sample Size Determination. In our first
pilot study, five replicate 2-D gels from a single uterine sample
were used to examine reproducibility. The results showed that
the total number of protein spots per gel was reasonably
similar. However, when we looked only at common spots
among the gels, we found that as the multiplicity of gels
increased there was a significant decrease in the number of
common spots (unpublished data). This is consistent with
earlier findings reported by Voss and Haberl.7

A second pilot study using uteri from 8 control and 8
genistein-treated rats also showed that as the multiplicity of
gels increased there was a significant decrease in the number of
common spots. Since this decrease in matched spots occurred at
similar rates in both groups, it indicated a lack of sample
handling bias. The pooled control estimate of standard deviation
in normalized peak intensities was used to determine that
a sample size of 8 animals per treatment group would be
sufficient to detect a 1.5-fold change between the two groups.
This change was detected with over 99% power, based on a
two-sample t-test with an experiment-wise level of significance
of p < 0.05, with adjustments for multiple testing. Figure 1
displays the power curves for the detection of four different
fold changes (1.2, 1.3, 1.4, and 1.5). The power is defined as
the probability of detecting the specified fold change and is
displayed over sample sizes ranging between 4 and 12. While
a sample size of 8 gives 99% power to detect a 1.5-fold change,
this power drops to 82% to detect a 1.3-fold change. A sample
size of 6 gives only 28% power to detect a 1.3-fold change.
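
The qualitative shape of such power curves can be reproduced with a normal approximation. The Python sketch below is not the calculation behind Figure 1: it assumes the analysis is done on log-expression, takes a hypothetical within-group standard deviation as input, ignores the multiple-testing adjustment, and neglects the far tail of the two-sided test. The function name is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(fold_change, sd_log, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample comparison.

    fold_change : fold change to detect (for example, 1.5)
    sd_log      : assumed within-group SD of log-expression (hypothetical input)
    n_per_group : number of gels (animals) in each treatment group
    """
    z = NormalDist()
    # Standardized effect: difference in mean log-expression over its SE.
    effect = abs(log(fold_change)) / (sd_log * sqrt(2.0 / n_per_group))
    return z.cdf(effect - z.inv_cdf(1.0 - alpha / 2.0))


if __name__ == "__main__":
    for n in range(4, 13):
        # sd_log = 0.2 is an arbitrary illustrative value, not a study estimate.
        print(n, [round(approx_power(fc, 0.2, n), 3) for fc in (1.2, 1.3, 1.4, 1.5)])
```

Power rises with both the sample size and the fold change to be detected, which is the pattern the power curves describe.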
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Figure  2.  2-D  gel  profile  (A)  A  display  of  unsupervised  spot  detection  results.  (B)  A  display  of  those  spots  common  to  all  gels  in  the 
experiment. 


Figure  3.  Supervised  spot  evaluation.  Initial  evaluation  of  common  spots  is  based  on  the  p-value  from  a  two-sample  t-test.  The  graph 
shows the mean value (±SEM) for the normalized value. Spots are ranked according to p-values and all spots with a p < 0.05 are
subjected  to  visual  inspection  to  verify  consistency  of  spot  parameters.  This  figure  shows  that  while  the  t-test  on  the  normalized  value 
was  significant  (p  =  0.0431),  there  is  inconsistency  in  spot  detection. 


3.2. Data Quality and Processing. Results of unsupervised
matching and spot detection (Figure 2A) demonstrate the need
for a directed process of image cleanup before evaluation. After
initial matching, we focused only on those spots found in all
gels (Figure 2B). Common spots with significant p-values (p <
0.05, t-test) were subjected to visual verification to ensure both
accuracy of matching and consistency of spot boundary (Figure
3). This early analysis is critical to prevent improper data
interpretation. Once several new landmarks have been established,
the matching program and inspection process are rerun.
This iterative process greatly increases the efficiency of subsequent
evaluations by providing well-matched data points for
the more robust statistical procedures.

3.3. Statistical Analysis of Two Experimental Groups. Our
primary data set was generated using 18 gels representing
unique mammary gland samples in each of two treatment
groups (10 control and 8 DMBA-treated rats). Analysis of all 18
gels yielded 247 spots that were present in every gel. These 247
common spots were subjected to statistical differential expression
analysis. Evaluation of the data using only the t-test on
the untransformed data found 13 spots to be significantly
different between the 2 groups (p < 0.05) (Figure 4A). Testing
of the log-transformed data yielded a total of 15 spots that were
significantly different (p < 0.05) (Figure 4B). GM calculations
added 11 spots for a total of 26 spots that were
significantly different (p < 0.05) (Figure 4C). Using a 0.05 level
of significance, the estimated FDR was 0.20. Therefore, we
expected 5 of the 26 spots found using the GM to be false
positives.
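
The arithmetic behind the expected number of false positives is simply the estimated FDR multiplied by the number of spots declared significant. The short Python sketch below shows that step only; how the FDR itself was estimated is not specified here, and the function name is hypothetical.

```python
def expected_false_discoveries(fdr: float, n_significant: int) -> float:
    """Expected number of false positives among the spots called significant."""
    return fdr * n_significant


if __name__ == "__main__":
    # 15 spots from the log-transformed t-test plus 11 added by the GM.
    n_gm = 15 + 11
    print(n_gm, expected_false_discoveries(0.20, n_gm))
```

An FDR of 0.20 applied to 26 spots gives 5.2, which rounds to the 5 expected false positives quoted above.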

3.3.1. Generalized Models. An advantage of the GM procedure
is that it allows for the mean and variance to be linked
and vary simultaneously between groups. Individual data plots
for three spots where the p-values differed for the t-test,
log-normalized, and GM analyses are presented in Figure 5. For each
protein spot, the individual data points of mammary glands of
control and DMBA-treated animals are graphed to show the variation
for the log-normalized data. For those instances where the
normalized and log-normalized data are not significantly different,
we assume that the mean values are similar. Graphs in Figure
5A,B demonstrate that while the means are similar, the
underlying variation of expression is different. Therefore, using
the GM we model this variation and determine the spots to be
significantly different (p < 0.05).

Figure 4. Comparison of traditional and GM testing procedures. The number of significant spots for each testing procedure is represented
below the dashed line (p < 0.05). (A) The t-test applied to the normalized data found 13 spots to be significantly different, (B) means
testing on the log-transformed data found 15 spots differentially expressed, and (C) results using the GM captured 26 spots as differentially
regulated.

Figure 5. Consideration of variance. Each graph displays the log-transformed data for an individual sample in either group (Control or
DMBA). (A) Results of the two-sample t-test on the normalized or log-transformed data for spot 1290 are not significant (p > 0.05). However,
results of the GM show a highly significant (p = 7.213 × 10^-5) difference in variance between the control and DMBA groups. (B) For
spot 1471, results of the two-sample t-test on the normalized or log-transformed data are not significant (p > 0.05). However, the GM
found a highly significant (p = 0.0004528) difference in variance between the control and DMBA groups. (C) For spot 1304, the results
of the two-sample t-test were significant both without and with log transformation (p < 0.05), and the GM was also significant (p = 0.001866).

3.3.2. Tests on Equal Variances (Folded Form F-test). Figure
6 illustrates that using just the Folded Form F-test (testing only
on variance) we find 33 unique proteins not captured by any
of the other tests. Finally, we see that there is an overlap of
only 3 spots identified as being significant using all testing
procedures. The field graph in Figure 7 illustrates all 247 protein
spots that were evaluated. This graph reveals the overlap of
spots found to be significant. Each significant spot's location
is based on either differences only in the variance as a function
of the same mean, or variance in the absence of similar means.
Finally, spots uniquely found significant using the GM procedure
are distinguished in the broad field.
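
The overlap bookkeeping behind this kind of comparison can be sketched with Python sets. The function name, dictionary keys, and spot IDs below are hypothetical and are not taken from the study; each argument is the set of spot IDs found significant by one procedure.

```python
def classify_spots(t_raw, t_log, gm, f_test):
    """Summarize the overlap among sets of significant spot IDs.

    t_raw, t_log : spots significant by t-test on raw and log-transformed data
    gm           : spots significant by the generalized model
    f_test       : spots significant by the Folded Form F-test
    """
    means_based = t_raw | t_log | gm
    return {
        "gm_only": gm - t_raw - t_log,          # found by the GM alone
        "variance_only": f_test - means_based,  # found by the F-test alone
        "all_methods": t_raw & t_log & gm & f_test,
    }


if __name__ == "__main__":
    # Made-up spot IDs for illustration only.
    result = classify_spots({1, 2}, {1, 2, 3}, {1, 2, 3, 4, 5}, {2, 5, 9})
    print(result)
```

Keeping the per-method lists as sets makes it straightforward to report which model was responsible for each spot being called significant.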

4.  Discussion 

Proteomics and genomics fall under the general heading of
systems biology. Systems biology focuses on the interaction of
all molecular components, including DNA, RNA, proteins,
protein interactions, biomodules, cells, tissues, etc., with each
of these components having its own individual elements (e.g.,
specific gene methylation or protein post-translational modifications).
A systems-level view is necessary to understand the
complex dynamics that underlie the physiology in both the
normal and diseased states. Systems biology is characterized
by a synergistic integration of theory, computation, and experiment.42

Advances in recent technology make possible the large-scale
application of proteomics for biomarker discovery in cancer
models and the exploration of mechanisms of action of drugs.
These advances result in the ability to readily run reproducible
2-D gels for protein separation and obtain protein identification
using mass spectrometry techniques, such as MALDI-TOF.
Software programs, such as Progenesis, have been developed
that aid the researcher in evaluating changes in protein
expression profiles among groups and between samples.
However, these programs lack substantial statistical analysis
tools to help researchers determine the most important and
persistent changes throughout the experiment. Without adequate
means of analysis, the researcher is left to generate a
long list of proteins for identification and is then required to
use a hit-or-miss strategy for further analysis.

The 2-D gel cleanup/spot review and evaluation cycle has
long been considered the bottleneck of 2-D gel experiments.
This has resulted from overreliance on the unsupervised
matching and spot evaluation by the software followed by an
unscripted procedure for cleanup by the end-user. Therefore,
we have developed a method that greatly increases the speed
of this process by providing guidance and direction. Through
multiple trials we have determined that statistical analyses are
best conducted only on the common spots.

Figure 6. Nested collection of significant spots. The GM procedure
found the same spots as the traditional t-test as well as
those found from testing on the log-transformed data. The
number of unique spots using the Folded Form F-test (s = 33)
is shown by the green circle. Three spots were found
to be significant regardless of testing method.

Our current research focuses on finding biochemical events
that indicate the earliest stages of breast cancer development.
Using an animal model of carcinogenesis, we developed our
evaluation of markers along a known timeline of tumor
development. The DMBA model we chose has been demonstrated
to result in 100% mammary tumor incidence. In general,
we have seen that DMBA treatment of 50-day-old rats results
in palpable tumor development when the animals are 100-120
days old; therefore, our choice to evaluate mammary glands
at 75 days of age (25 days after DMBA administration)
represents a very early stage of carcinogenesis. Pathological
examination of these animals showed no lesion formation in
the DMBA-treated animals at day 75. Given that cancer is a
disease process with a long developmental period, we acknowledge
that the earliest stages of carcinogenesis are likely marked
by subtle alterations in protein expression. These low expression
differences are one reason that we have emphasized power
analysis to provide information about our lower limits of
detection in 2-D gel experiments.

Figure 7. Field graph of commonly expressed spots. The vertical axis represents the p-values from a two-sample t-test conducted on
the log-transformed data (shaded area represents p < 0.05). The horizontal axis represents the p-values resulting from a test on equal
variances among the groups (shaded area represents p < 0.05). Red circles depict those p-values based on the generalized linear
model (GM) that were significant.

Power analysis is a standard method to determine a level of
sensitivity for value change (such as spot volume fold change)
as a function of the sample size. In any biomedical experiment,
the number of experimental units (sample size) should be
selected to maximize the probability (power) of detecting a
predetermined significant difference between two or more
treatments (i.e., protein fold change). By addressing the issue
of sensitivity from the beginning, this knowledge can be applied
to help determine if the changes in expression of a particular
protein make logical sense for the given experimental design/biology.
While replication studies for power determination can
be costly, establishment of statistically relevant data will lead
to reduced end-cost. For our biological model, the result of
power and sample size determination established our ability
to confidently identify those spots that differed in mean
expression by 1.5-fold or greater with a reasonable number of
biological replicates. However, results of traditional expression
evaluation, t-test and log-transformed data, only identified a
finite number of significant spots (s = 13 and s = 16,
respectively). In fact, finding 13 to 16 spots represents only
5-6% of the total evaluated protein spots (s = 247). This low
value, while technically accurate, represents a level of finding
not unlikely to be based on chance. Therefore, we needed to
design a more robust approach to evaluate our data.
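
The remark about chance can be made concrete with a binomial calculation. The Python sketch below assumes, purely for illustration, that all 247 tests are independent and that no spot truly differs between groups; under those assumptions the number of spots significant at the 0.05 level is Binomial(247, 0.05), with an expected count of about 12. The function name is hypothetical.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_spots, alpha = 247, 0.05
    print("expected by chance:", n_spots * alpha)
    print("P(at least 13 significant):", binom_tail(13, n_spots, alpha))
```

Under these simplifying assumptions, observing 13 or more significant spots is an unremarkable outcome, which is the sense in which such a count is not unlikely to be based on chance.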

As mentioned in the Introduction, the application of the GM
provides additional information on the distribution of the
individual data points for a particular spot. By applying the
concept of NoH to our evaluation, we saw alteration in the
variance levels, either tighter regulation or dysregulation, for
some of the proteins examined, while the mean appears similar
(see Figure 5). Spread of variation shows the natural characteristic
of the model to allow for wide fluctuations in the normal
circumstance, or the inverse, that certain proteins require strict
control to maintain adequate cellular function. By evaluating
the changes in variance of expression, we gain insight into a
level of control that may be involved in the promotion of
carcinogenesis. Since the purpose of 2-D gels is to look at a broad
spectrum of proteins, we may be able to establish patterns of
variance alteration and determine if proteins that undergo
positive or negative shifts in expression are functionally related
to one another or the disease process.

The changes of any particular protein over the course of
tumor development will themselves alter as, in the case of mammary
cancer, the underlying cell population changes.11 Traditionally,
tumorigenesis is measured as a mean time to tumor development;
hence, we have to use multiple animals per group to get the
mean time to the first, second, etc. tumor per rat since the individual
animal's response is different. Furthermore, it is the fundamental
effect of treatments such as cancer promoters or
chemopreventive agents to alter the time of tumor development.
However, all of these measures ignore the individual
response or the general group response unless the mean levels
are significantly different. Ultimately, we appreciate that
underlying alterations involved in the long-term process of
carcinogenesis will likely be found in subtle, yet persistent,
changes in cellular signaling.

It is well recognized that the value for an individual spot on
a 2-D gel does not necessarily represent an absolute measurement
for the concentration of a protein. For this reason, we
acknowledge that there is some inherent weakness in performing
exhaustive evaluations of spots from a statistical standpoint.
It is our assumption that investigators are willing to make
certain tradeoffs in data quality vs time and future evaluation.
That is to say, any mass spectrometry-based protein identification
is going to require more stringent confirmation procedures,
such as immuno techniques. In turn, these techniques will
allow for a more quantitative assessment of changes in protein
concentration. It is our intent to provide more information
about the general qualities of the information that the 2-D gel
is providing and to help guide the researcher in the decision-making
process with respect to which spots should be evaluated
first. Therefore, displays such as Figure 7 provide all of
the information with respect to what model resulted in a spot
being found significant. This system alleviates the production
of laundry lists of proteins and allows for directed and focused
studies of particular proteins/pathways that are involved in the
condition under study. Therefore, our future experiments will
be designed to more accurately capture data related to the
temporal changes we have observed to better establish the role
of identified proteins.

In summary, we have described a reproducible and statistical
approach to the use of 2-D gels for identification of biomarkers
that may be related to DMBA-induced carcinogenesis in the rat
mammary gland. These methods lend themselves well to the discovery of
novel proteins and identification of key signaling pathways
involved in cancer causation. Our statistical approach involves
empirical determination of the number of gels required to
ensure statistical power for appropriate evaluation. In general,
the approach we used results in quickly identifying those
proteins that meet a realistic and significant change, but is also
broad enough to allow the unique modeling approach of the
GM. The approach that we have outlined is what we consider
to be discovery proteomics. Only when we have mass spectrometry
data for identification do we consider this as our
preliminary data, not as confirmatory or primary data.
Experiments can then be designed to evaluate the validity of
identifications, including the previously mentioned more specific
techniques of quantification.
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